SparkR basics

This notebook shows a few very simple steps with SparkR.


In [1]:
# Load the SparkR package.
# Loading it will print a few warnings about functions that the package masks
library(SparkR)


Attaching package: 'SparkR'

The following objects are masked from 'package:stats':

    cov, filter, lag, na.omit, predict, sd, var

The following objects are masked from 'package:base':

    colnames, colnames<-, intersect, rank, rbind, sample, subset,
    summary, table, transform


In [4]:
# Unlike the Python & Scala kernels, the IRkernel does not give us an automatically
# created Spark Context; we need to initialize one ourselves. That takes a few moments.
sc <- sparkR.init(master = "local[*]")


Re-using existing Spark Context. Please stop SparkR with sparkR.stop() or restart R to create a new Spark Context

In [5]:
# Once we have it, we can also obtain an SQL context
sqlContext <- sparkRSQL.init(sc)

In [6]:
# Do something to prove it works

# Load one of the standard datasets that come pre-packaged with R
data(iris)

# Turn the dataset into a SparkR DataFrame
df <- createDataFrame(sqlContext, iris)

# Inspect it
head(filter(df, df$Petal_Width > 0.2))


Warning message:
In FUN(X[[i]], ...): Use Sepal_Length instead of Sepal.Length as column name
Warning message:
In FUN(X[[i]], ...): Use Sepal_Width instead of Sepal.Width as column name
Warning message:
In FUN(X[[i]], ...): Use Petal_Length instead of Petal.Length as column name
Warning message:
In FUN(X[[i]], ...): Use Petal_Width instead of Petal.Width as column name
Out[6]:
  Sepal_Length Sepal_Width Petal_Length Petal_Width Species
1          5.4         3.9          1.7         0.4  setosa
2          4.6         3.4          1.4         0.3  setosa
3          5.7         4.4          1.5         0.4  setosa
4          5.4         3.9          1.3         0.4  setosa
5          5.1         3.5          1.4         0.3  setosa
6          5.7         3.8          1.7         0.3  setosa
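
Note that Spark replaced the dots in the iris column names with underscores, because dots are not valid in Spark SQL column names. Since we have an SQL context, we can also query the DataFrame with plain SQL. A minimal sketch using the SparkR 1.x API already loaded above (the temporary table name "iris" is our own choice):

In [ ]:
# Register the DataFrame as a temporary table so Spark SQL queries can see it
registerTempTable(df, "iris")

# Run a plain SQL aggregation through the SQL context and fetch the result
head(sql(sqlContext, "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species"))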

In [ ]:
# sc is an existing SparkContext.
# hiveContext <- sparkRHive.init(sc)
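
When finished, the context can be stopped so that a new one can be created later, as the message above suggests. Uncomment to actually stop it:

In [ ]:
# Stop the Spark Context; afterwards a new one can be created with sparkR.init()
# sparkR.stop()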
